Estimating Continuous Distributions in Bayesian Classifiers
Abstract
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.

1 Introduction

In recent years, methods for inducing probabilistic descriptions from training data have emerged as a major alternative to more established approaches to machine learning, such as decision-tree induction and neural networks. For example, Cooper & Herskovits (1992) describe a greedy algorithm that determines the structure of a Bayesian inference network from data, while Heckerman, Geiger & Chickering (1994), Provan & Singh (1995), and others report advances on this basic approach. Bayesian networks provide a promising representation for machine learning for the same reasons they are useful in performance tasks such as diagnosis: they deal explicitly with issues of uncertainty and noise, which are central problems in any induction task. However, some of the most impressive results to date have come from a much simpler, and much older, approach to probabilistic induction known as the naive Bayesian classifier.
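The two density estimators compared in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names are invented here, and the default bandwidth h = 1/sqrt(n) is an assumption about the kernel-width choice.

```python
import math

def gaussian_pdf(x, values):
    """Parametric estimate: fit a single Gaussian (mean, std) to the
    training values for one class and evaluate its density at x."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    sigma = math.sqrt(var) or 1e-9  # guard against zero variance
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def kernel_pdf(x, values, h=None):
    """Nonparametric kernel estimate: the average of n Gaussian kernels,
    one centred on each training value, with bandwidth h."""
    h = h or 1.0 / math.sqrt(len(values))  # assumed default: h = 1/sqrt(n)
    return sum(
        math.exp(-((x - v) ** 2) / (2 * h ** 2)) / (h * math.sqrt(2 * math.pi))
        for v in values
    ) / len(values)
```

On a bimodal sample (say, class values clustered at 0 and 10), the single Gaussian places its mass between the clusters, while the kernel estimate keeps density at each cluster, which is the behaviour motivating the comparison above.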
Despite the simplifying assumptions that underlie the naive Bayesian classifier, experiments on real-world data have repeatedly shown it to be competitive with much more sophisticated induction algorithms. For example, Clark & Niblett (1989) report naive Bayes producing accuracies comparable to those for rule-induction methods in medical domains, and Langley, Iba & Thompson (1992) found that it outperformed an algorithm for decision-tree induction in four out of five domains. These impressive results have motivated some researchers to explore extensions of naive Bayes that lessen dependence on its assumptions but that retain its inherent simplicity and clear probabilistic semantics. Langley & Sage (1994) describe a variation that mitigates the independence assumption by eliminating predictive features that are correlated with others. Kononenko (1991) and Pazzani …
Similar Papers
Possibilistic classifiers for numerical data
Naive Bayesian Classifiers, which rely on independence hypotheses, together with a normality assumption to estimate densities for numerical data, are known for their simplicity and their effectiveness. However, estimating densities, even under the normality assumption, may be problematic in the case of poor data. In such a situation, possibility distributions may provide a more faithful representat...
Conditional Log-Likelihood for Continuous Time Bayesian Network Classifiers
Continuous time Bayesian network classifiers are designed for analyzing multivariate streaming data when the time duration of events matters. New continuous time Bayesian network classifiers are introduced while their conditional log-likelihood scoring function is developed. A learning algorithm, combining conditional log-likelihood with Bayesian parameter estimation, is developed. Classification ac...
Bayesian Networks with Imprecise Probabilities: Theory and Application to Classification
Bayesian networks are powerful probabilistic graphical models for modelling uncertainty. Among others, classification represents an important application: some of the most used classifiers are based on Bayesian networks. Bayesian networks are precise models: exact numeric values should be provided for quantification. This requirement is sometimes too narrow. Sets instead of single distributions ...
The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data
The goal of this study is to introduce a statistical method for analyzing specific latent data in regression analysis of discrete data, and to build a relation between a probit regression model (for the discrete response) and a normal linear regression model (for the latent continuous response). This method provides precise inferences on binary and multinomia...
Augmented Naive Bayesian Classifiers for Mixed-Mode Data
Conventional Bayesian networks often require discretization of continuous variables prior to learning. It is important to investigate Bayesian networks allowing mixed-mode data, in order to better represent data distributions as well as to avoid the overfitting problem. However, this attempt imposes potential restrictions on a network construction algorithm, since certain dependency has not bee...